The Memory logP Model of Local Communication

Authors

  • Kirk W. Cameron
  • Xian-He Sun
Abstract

Data movement across a memory hierarchy can severely impact application execution time. For example, on the fast interconnect of the Origin 2000, three- and four-fold increases in communication cost are not uncommon for small messages (~1K) stored non-contiguously. Simple, accurate predictions of communication time in hierarchical memories can identify bottlenecks in communication performance during algorithm design. We present a simple and useful model of point-to-point memory communication, inspired by LogP, to predict and analyze the latency of memory copy, pack, and unpack for varying memory access patterns. We use our model to isolate the contributions of hardware, middleware, and software to data transfers on Intel- and MIPS-based platforms.
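
The pack operation mentioned in the abstract can be illustrated with a small timing sketch. This is not the authors' model or benchmark: the function names `pack_strided`, `copy_contiguous`, and `time_call`, the stride values, and the 128-element buffer (128 doubles, roughly the ~1K message size cited above) are all assumptions chosen for illustration. It only shows what gathering non-contiguous data into a contiguous buffer looks like and how one might time it.

```python
import time
from array import array


def pack_strided(src, stride, count):
    """Gather `count` elements spaced `stride` apart into a new contiguous array."""
    return src[0:stride * count:stride]


def copy_contiguous(src, count):
    """Copy the first `count` elements (the stride-1 baseline)."""
    return src[:count]


def time_call(fn, *args, repeats=20):
    """Return the best wall-clock time in seconds over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    count = 128  # hypothetical message size: 128 doubles = 1 KB
    src = array("d", range(count * 64))
    base = time_call(copy_contiguous, src, count)
    print(f"contiguous copy: {base:.2e} s")
    for stride in (2, 8, 64):  # illustrative strides, not from the paper
        t = time_call(pack_strided, src, stride, count)
        print(f"pack, stride {stride:2d}: {t:.2e} s")
```

Interpreter overhead dominates at this message size, so the printed times should not be read as hardware measurements; a compiled-language version would be needed to observe memory-hierarchy effects of the kind the abstract describes.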

Similar resources

Models and Resource Metrics for Parallel and Distributed Computation

This paper presents a framework of using resource metrics to characterize the various models of parallel computation. Our framework reflects the approach of recent models to abstract architectural details into several generic parameters, which we call resource metrics. We examine the different resource metrics chosen by different parallel models, categorizing the models into four classes: the basi...

Models and Resource Metrics for Parallel and Distributed Computation

This paper presents a framework of using resource metrics to characterize the various models of parallel computation. Our framework reflects the approach of recent models to abstract architectural details into several generic parameters, which we call resource metrics. We examine the different resource metrics chosen by different parallel models, categorizing the models into four classes: the b...

Hiding Communication Costs in Bandwidth-Limited Parallel FFT Computation

This paper presents a novel computation schedule for FFT-type computations on a bandwidth-limited parallel computer. Using P processors, we are able to process an n-input FFT graph in the optimal time of (n log n)/P by carefully interleaving interprocessor communication steps with local computation. Our algorithm is suitable for both shared-memory and distributed-memory machines and is analyzed in...

Further Results with Algorithmic Skeletons for the CLUMPS Model of Parallel Computation

The CLUMPS (Campbell's Lenient, Unified Model of Parallel Systems) model of parallel computation is composed of an architectural model with an associated cost model. The architectural model employs a multi-level memory hierarchy, and so requires general locality of communication (communication between close processors). The multi-level memory hierarchy is reflected in the cost model which is based on...

Programming Data-parallel - Executing Process-parallel

Most theoretical work is based on the PRAM model, which has a block of shared memory and executes in a synchronous lock-step mode. Real hardware usually executes asynchronously and uses local memory and message passing. The recent LogP model reflects these architectural properties. We show that for a practically important subclass of PRAM programs it is possible to transform them into LogP-progra...

Publication date: 2002